An efficient quasi-Monte Carlo method with forced fixed detection for photon scatter simulation in CT

Detected scattered photons can cause cupping and streak artifacts, significantly degrading the quality of CT images. For fast and accurate estimation of scatter intensities resulting from photon interactions with a phantom, we first transform the path probability of photons interacting with the phantom into a high-dimensional integral. Secondly, we develope a new efficient algorithm called gQMCFFD, which combines graphics processing unit(GPU)-based quasi-Monte Carlo (QMC) with forced fixed detection to approximate this integral. QMC uses low discrepancy sequences for simulation and is deterministic versions of Monte Carlo. Numerical experiments show that the results are in excellent agreement and the efficiency improvement factors are 4 ∼ 46 times in all simulations by gQMCFFD with comparison to GPU-based Monte Carlo methods. And by combining gQMCFFD with sparse matrix method, the simulation time is reduced to 2 seconds in a single projection angle and the relative difference is 3.53%.


Introduction
Computed tomography (CT) has been widely used in many fields, and has revolutionized diagnostic radiology with its great advantages of non-destructive, non-overlapping images and high-resolution since the first CT was developed in 1972 by Hounsfield [1,2].
The process of CT image reconstruction produces an estimate of the linear attenuation coefficient μ tot (x, E) of the phantom M from data obtained by measuring the attenuation of Xrays along multiple paths through M, where x is the spatial coordinates and E is the photon energy. Let I 0 (l) and I(l) represent the intensity of the X-ray entering and exiting M along the path l, respectively. Both scatter and primary intensities are recorded by the detector D. An illustration of photon paths in CT is shown in Fig 1. The CT system is composed of X-ray source S, a three-dimensional phantom M and a detector D, which is composed of m × m detector pixels. Photons emitted from the X-ray source S will be absorbed or scattered when passing through the phantom M. After a certain amount of scatter, photons will reach the detector D or not with a certain probability. As shown in Fig 1, the red lines l 1 , l 2 , l 3 are 1-, 2-, 3-order scatter photon paths, respectively, and they reach the detector D. The green lines are the primary intensities which are useful. The gray lines are the other intensities that don't reach the detector D.
The Lambert-Beer law [3] describes the attenuation of X-rays passing through the measured phantom M under ideal conditions, without considering scattered photons that may reach the detector. By the Lambert-Beer law so μ tot (x, E) of each projection data will be underestimated when scatter is present. Scatter intensities induce nonlinear errors in the measurement of μ tot values and have a complex effect on the reconstructed CT images. It may lead to dark steak artifacts between image regions of high attenuation, cup-shaped artifacts in homogeneous objects and the reduction of the contrast resolution of the reconstructed CT images [4]. Furthermore, the contribution of scatter intensity to the total intensity (which is the sum of the scatter and primary intensity) concurrently grows when using high-energy X-ray, or multi-row and flat-panel detectors in CT [5]. Therefore, superior scatter correction algorithms are needed. Numerous methods for scatter calculation and correction have been proposed, such as scatter kernel superposition (SKS) algorithms [6], fast adaptive SKS (fASKS) [7], Monte Carlo (MC) methods [5,[8][9][10][11][12][13], Acuros CTS (deterministic solution of linear Boltzmann transport equation) [14,15], neural network approaches [16,17] and so on. SKS uses scatter point spread functions generated from pencil beams to perform deconvolution on the measured projection data using derived kernels. Although SKS is an approximate method for scatter estimation with advantages such as high computational efficiency and practicality, it suffers from inaccurate estimation of scatter intensities. MC method has been widely applied in radiation therapy, medical imaging and nuclear medicine for solving a variety of radiation transport problems [18]. Many MC-based scatter correction tools have been developed, such as PENELOPE [8], methods of combining MC and fixed forced detection (FFD) [5,9,10], which only consider theoretical interacting photons to make a contribution to the intensity in the fixed detector pixel, and GPU-based MC tools (denoted by MC-GPU [11], gMCDRR [12] and gMMC [13]). MC-GPU, which is a GPU implementation of PENELOPE, is expected to have the same accuracy as standard MC. Compared with MC-GPU, gMCDRR uses the differential cross section (DCS) data to calculate the scatter direction and uses the generalized inverse transform method instead of the acceptance-rejection method to sample the scatter direction. gMMC is a GPU-based Metropolis MC, which uses the Metropolis-Hasting algorithm to sample the whole photon paths from the X-ray source to the detector. However, GPU-based MC tools has a low convergence rate O(N −1/2 ), where N is the number of simulations. When greater accuracy is required, N needed increases rapidly. This causes high computational burden. Addtionally, the existing GPU-based MC tools passively receives the scatter intensity reaching the detector, leading to significant computational effort being spent on simulating those photons that do not contribute to the scatter intensity, therefore reducing the overall computational efficiency. Although gMMC can improve photon utilization, it brings serious computational burden and other new problems. Acuros CTS estimated scatter intensities in X-ray projection data by deterministically solving the linear Boltzmann transport equation (LBTE) and was applied to the clinic for treatment planning for radiotherapy of cancer. Although different scatter estimation and correction approaches have been developed, a standard solution is still being studied.
Quasi-Monte Carlo (QMC) method is a deterministic version of MC method which uses low discrepancy sequences [19,20] instead of random sequences for simulation. Under suitable conditions, the convergence rate of QMC is close to O(N −1+� ), � > 0, which is asymptotically faster than that of MC for a fixed dimension. QMC is usually more efficient than MC for high-dimensional integration. gQMCFRD, which combines QMC with forced random detection (FRD), was proposed recently [21]. The efficiency improvement factors (EIFs) are 27 * 37 times when gQMCFRD is compared to MC-GPU.
The concerns of the paper are to develop and study an efficient scatter estimation and correction mechanism for CT which is suitable for QMC. The contributions of the paper are twofold. Firstly, we transform the problem of simulating the photon scatter in CT into a highdimensional integral. Secondly, a new and efficient scatter estimation algorithm is developed, which combines GPU-based QMC with forced fixed detection (FFD). Numerical results show that a substantial gain in efficiency can be realized for simulating the scatter intensity in CT relative to MC-based algorithms.
The rest of this paper is organized as follows. The formulation of simulating the photon scatter in CT and the algorithm are introduced in Materials and Methods. The results of the scatter intensities in both a homogeneous Al phantom, Bone-Tissue (BT) cylinder, Shepp-Logan (SL) phantom and Head phantom are presented in Numerical Experiments. Finally, the paper is summarized in Conclusions.

Derivation of photon path probability P(l i,j )
The interaction between photons and the phantom is random, which can be organized overall into three steps: (i) photons are emitted from the X-ray source S and move to the phantom M with a certain probability; (ii) interact with M and scatter, possibly multiple times; (iii) escape M and reach the detector D with a certain probability. Therefore, we introduce the derivation of the path probability P(l i,j ) of scattered photons in three steps. An illustration of the scattered photon paths is shown in Fig 2. Source S to phantom M. Let I 0 ðõ 0 Þ be the initial flow intensity of a photon emitted from the X-ray source S moving along the unit directionõ 0 and ϕ(E) is the X-ray energy spectrum distribution. The path length l of the photon in M follows the distribution So the probability P(S ! A 1 ) of the photon from the X-ray source S to the first interaction point A 1 along the unit directionõ 0 is where E 0 is the initial energy of the photon, O E 0 is the value space of the energy spectrum, is the length from A 0 to A 1 along the unit directionõ 0 , which is a random variable, c 0 is the length from A 0 to C 0 along the unit directionõ 0 and Scatter. Each interaction is characterised by the associated linear attenuation coefficient μ tot (x, E), which represents the probability of photon passing through the unit path and interacting with the phantom M, and is a function of both the spatial coordinates x and the energy E. The quantity μ tot (x, E) is the sum of m T 0 ðx; EÞ, m T 1 ðx; EÞ and μ a (x, E), where m T 0 ðx; EÞ is the Compton scatter coefficient, m T 1 ðx; EÞ is the Rayleigh scatter coefficient and μ a (x, E) is the photoelectric coefficient. The details for X-ray interactions can be found in the reference [22]. The . ., n}, n is the highest scatter order) and reaches the next-order interaction point A i is where p T 0 ðA iÀ 1 Þ and p T 1 ðA iÀ 1 Þ are the probabilities of Compton scatter and Rayleigh scatter in and are the probability density function (PDF) of the Compton scatter and Rayleigh scatter polar angle and are normalized linear attenuation coefficients. The quantity E 0 iÀ 1 and E 1 iÀ 1 are the remaining energy at A i−1 after Compton and Rayleigh scatter occurred, respectively.
So the probability P(A 1 ! � � � ! A i |S ! A 1 ) that the photon interacts with M at A i−1 (i = {2, . . ., n}) and reaches the interaction point A i is Interaction point A i to fixed detector pixel D j . The photon will escape from the phantom M and reach the detector D with a certain probability after i-th scatter occurs at A i , i = 1, 2, . . ., n. In order to improve the efficiency, only the photons that reach the fixed detector after interaction are considered. This is called fixed forced detection (FFD) [5,9,10]. As shown in Fig 2, after the i-th interaction occurs at A i , i = 1, 2, . . ., n, the photon reaches the fixed detector pixel D j , j = 1, . . ., m × m, with a certain probability along the forced unit directionõ i;j . The probability is is the area of the detector pixel D j and α i,j represents the angle betweenõ i;j and the normal direction of the detector pixel D j , B i;j ¼ A i þ b i;jõi;j is the intersection of the photon movement alongõ i;j and M. So the probability P(l i,j ) that the photon follows the path l i,j : The total probability accumulated after n-order interaction in the fixed detector pixel D j , j = 1, . . ., m × m, is P n i¼1 Pðl i;j Þ.

gQMCFFD algorithm and implementation
In this section, we give the specific implementation steps for simulating the path probability for scattered photons. QMC and FFD are used in the simulation, we call these simulation steps the gQMCFFD algorithm. We will use Sobol' points [23,24] in our simulation. Sobol' sequences are low-discrepancy sequences with better uniformity than random sequences. Other low-discrepancy sequences, such as Halton sequences [25] and Faure sequences [26], can also be used for simulation. The first component u 1 of the pregenerated 4n-dimensional Sobol' point u = (u 1 , u 2 , . . ., u 4n ) is used to sample the initial energy E 0 from the predetermined energy distribution ϕ(E 0 ) by the Walker's aliasing method [27]. Walker's aliasing method is suitable for sampling discrete data and requires only one variable to be sampled. The distribution is shown in Fig 3. The quantity I 0 ðõ 0 Þ is the initial photon flux along the initial unit directionõ 0 and may be complicated in the solid angle of the photon flux, we use u 2 , u 3 to sample uniformly in [0, 4π]. Then the unit incident directionõ 0 , I 0 (u 2 , u 3 ) and the intersection A 0 of the initial incident X-ray and the phantom M can be calculated. So the photon with weight W 0 = I 0 (u 2 , u 3 ) starts at A 0 and moves alongõ 0 in the phantom M.
The path length t i , i = 1, . . ., n, of a photon from its current position A i−1 to the site of the next interaction A i follows the distribution (see [3]) where So the probability of the photon starting from A i−1 , i = 1, . . ., n, escaping the phantom M along the unit scatter directionõ iÀ 1 is where C iÀ 1 ¼ A iÀ 1 þ c iÀ 1õiÀ 1 , C i−1 is the intersection point of the photon alongõ iÀ 1 on the boundary of the phantom M, and c i−1 is the lengh of A i−1 to C i−1 , as shown in Fig 2. The primary intensity (namely the probability of the photon starting from S escaping the phantom M along the unit incident directionõ 0 and reaching the detector D) is After the (i − 1)-th interaction point A i−1 (i = 2, . . ., n) is sampled, we use the component u 4(i−1)+1 to sample scatter type by importance sampling [21] and use u 4(i−1)+2 and u 4(i−1)+3 to sample the unit scatter directionõ iÀ 1 by the Rational Inverse Transform with Aliasing (RITA) algorithm [21,28]. Importance sampling attempts to give more weight to important outcomes thereby increasing sampling efficiency and RITA is a general numerical algorithm for random sampling from continuous distributions using the inverse-transform method. Therefore, the photon with weight iÀ 1 ÞÞ undergoes the i-th interaction inside M, and the i-th interaction point A i , i = 1, 2, . . ., n, can be calculated by A i ¼ A iÀ 1 þ t iõiÀ 1 , the random variable t i is found by where u 4(i−1)+4 is the (4(i − 1) + 4)-th component of 4n-dimensional Sobol' point u.
So the probability that the photon follows the path l i,j : The components u 1 , u 2 , u 3 , u 4 of u are used to calculate the probability of the path: S ! A 0 ! A 1 . So P(l 1,j ) can be written as Besides, the photon path S !A 0 ! A 1 ! � � � ! A i is a Markov chain, the photon path S ! A 0 ! A 1 is sampled by u 1 , u 2 , u 3 , u 4 , A i−1 ! A i , i = 2, . . ., n, is sampled by u 4(i−1)+1 , u 4(i−1)+2 , u 4(i−1)+3 and u 4(i−1)+4 and A i is only related with A i−1 . So The total probability accumulated after n-order interaction in the fixed detector pixel D j , j = 1, . . ., m × m, is So f n,j is a function depending on u, where u = (u 1 , u 2 , . . ., u 4n ), and the formal dimension of the scatter correction mechanism is d = 4n. If low discrepancy points [19,20] are used for sampling, the mechanism is called as gQMCFFD algorithm. In this paper, Sobol' sequences [23,24] are used for sampling. If random sequences are used, the corresponding mechanism is called as gMCFFD algorithm.
In summary, the new scatter correction mechanism can be summarized in Algorithm 1.
The QMC or MC estimate of the total probability P n i¼1 Pðl i;j Þ is of the form

Numerical experiments
In this section, we first calculate the photon scatter intensities reaching the detector pixel D 256×256 on homogeneous Al phantom, Bone-Tissue (BT) cylinder, Shepp-Logan phantom (SL) [29] and Head phantom to demonstrate the proportion of higher order scatter will decrease. The detector D is composed of 512 × 512 detector pixels, and the detector pixels are denoted as D j , j = 1, . . ., 512 × 512. D 256×256 is the center of the detector D. Secondly, we calculate the X-ray scatter intensities reaching the whole detector D and compare gQMCFFD, gMCFFD, gQMCFRD [21] with gMCFRD [21], campare gQMCFFD with SKS [6] and fASKS [7]. In order to improve efficiency, sparse matrix method is used. The 64 × 64 matrices with interpolation are used instead of 512 × 512 matrices in a simulation. The nomenclatures of different algorithm acronyms are shown in Table 1.
In our numerical experiments, the X-ray source S is a point source with energy spectrum 16 − 120keV. The energy spectrum distribution is shown in Fig 3. The X-ray source S-to-phantom M distance and phantom M-to-detector D distance are both 500mm. The detector D resolution is 512 × 512 pixels with a pixel size of 0.8 × 0.8mm 2 . We use a GeForce RTX 2080 Ti graphics card that is equipped with 4352 processors and 11.0 GB global memory as computational hardware. In simulations, PENELOPE physical database of Geant4 (GEometry ANd Tracking) [30] is employed.
To measure accuracy, we compute the relative difference (RD), e.g., ||r − t|| 2 /||r|| 2 , where t is the scatter intensity computed by gQMCFRD, gMCFFD or gQMCFFD, and r is that computed by gMCFRD [21] or MC-GPU [11], jjgjj 2 ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P m�m i¼1 g i 2 p ; m ¼ 512. To compare the numerical    FOM ¼ 1 where σ is the standard deviation and T (in minutes) is defined as the calculation time and the efficiency improvement factor (EIF) where A is the gQMCFFD (or gMCFFD, or gQMCFRD) algorithm. An EIF greater than 1 implies algorithm A is more efficient relative to gMCFRD. In a simulation, gMCFRD and gQMCFRD use 2 29 source photons, gMCFFD and gQMCFFD use 2 15 source photons. Table 2 shows the probability P(l i,256×256 ) of each i-order scattered path, namely, the i-order scatter intensity that a photon reaches a fixed detector pixel D 256×256 after exactly i-order scatter occurs. In addition, it shows the proportion of the probability P(l i,256×256 ) of each i-order scattered path to the total probability f n,256×256 , where f n;256�256 ¼ P n i¼1 Pðl i;256�256 Þ is the total probability accumulated after n-order interaction in the fixed detector pixel D 256×256 . It can be seen that P(l i,256×256 ) is almost 0 for homogeneous Al phantom and Bone-Tissue cylinder as order i > 10, and for SL phantom and Head phantom as order i > 12 and the f n,256×256 converges to 0.00892 for the homogeneous Al phantom, to 0.00960 for BT cylinder, to 0.00876 for SL phantom, to 0.01209 for the Head phantom. So we take the maximum scatter order n = 10 for homogeneous Al phantom and Bone-Tissue cylinder and n = 15 for SL phantom and Head phantom in simulations.

Scatter intensities
Next, we simulate the scatter intensities f n,j , j = 1, . . ., 512 × 512 at the whole detector D. In a simulation, gMCFRD, gQMCFRD, gMCFFD, gQMCFFD uses 2 29 , 2 29 , 2 15 and 2 15 source photons, respectively. Fig 5(a)-5(d), which is generated by a 512 × 512 matrix, is the scatter intensities received by the whole detector D after the photon interacts with the homogeneous Al phantom using gMCFRD, gQMCFRD, gMCFFD, gQMCFFD, respectively. Fig 5(e) is the scatter intensity profiles of Al phantom through the center of the corresponding detector which parallel to the x-axis, where the black profile, the blue profile, the green profile, the red profile, is calculated by gMCFRD, gQMCFRD, gMCFFD, gQMCFFD, respectively.  good agreement, indicating the accuracy of gQMCFFD. As the resulting images are in good agreement, the number of photons used by gQMCFFD in a simulation is only 1 16384 times the number of photons used by gQMCFRD, which also further reflects the advantages of gQMCFFD. Table 3 shows the running time T (min), standard deviation σ (running one hundred times) of gMCFRD running 2 29 photons, gQMCFRD running 2 29 photons, gMCFFD running 2 15 photons, gQMCFFD running 2 15 photons for homogeneous Al phantom and the relative difference (RD) and EIF which compared gQMCFRD, gMCFFD, gQMCFFD with gMCFRD. Tables 4-6 show the results for Bone-Tissue cylinder, Shepp-Logan phantom and Head, respectively. It can be seen from Table 3 that the RD of gQMCFFD is 1.17% and the EIF of  gQMCFFD is 46.04 with compared to gMCFRD for Al phantom. For BT cylinder, SL phantom and Head phantom, the RD of gQMCFFD is 1.32%, 1.37% and 1.33%, the EIF of gQMCFFD is 38.51, 3.96 and 4.01, respectively. In addition, the EIF of gQMCFFD is much greater than 1 with compared to gQMCFRD and gMCFFD.

Validation with MC-GPU and SKS
Fig 9(a)-9(d) is the scatter intensities after the photon interacts with the SL using MC-GPU [11], SKS [6], fASKS [7], gQMCFFD respectively. Visually , Fig 9(a) and 9(d) are in good agreement. When combining gQMCFFD with the sparse matrix method and running on the GeForce RTX 2080 Ti graphics card, the simulation time of scatter intensity at one angle of the SL phantom can be reduced to 2 seconds and the RD is 3.53% with compared to MC-GPU. The RDs of SKS and fASKS are also present in Table 7. Consistent with the results in Fig 9, SKS and fASKS have relatively large RDs. The RD of SKS is 11.23% and The RD of fASKS is 7.85%. Fig 10(a) is the difference image between the scatter estimates of SL for gQMCFFD and MC-GPU; (b) is the scatter intensity profiles of SL at midline, where the black profile is estimated by MC-GPU, the red profile is estimated by gQMCFFD.

Conclusions
The goals of the paper are to develop and study a new and efficient scatter estimation and correction mechanism for CT. Firstly, the path probability of the interaction between the photon and the phantom is transformed into a high-dimensional integral. Secondly, an efficient scatter estimation algorithm is proposed. We verified the effectiveness and robustness of the gQMCFFD algorithm in homogeneous Al phantom, BT cylinder, SL phantom and Head phantom, and found that gQMCFFD is more successful than gMCFRD. The results are in excellent agreement with RDs less than 1.5% and EIFs are 4 * 46. Finally, we compare gQMCFFD, SKS, fASKS with MC-GPU in the SL phantom to further illustrate the effectiveness and robustness of gQMCFFD. By combining gQMCFFD with the sparse matrix method, the simulation time reduces to 2 seconds and the RD is 3.53%. In this paper, we have transformed the path probability of photons interacting with the phantom into a high-dimensional integral and the high dimensional integral can be solved using QMC simulation and FFD implemented on GPU, with validation against MC simulations. In the future, we will discuss modeling a clinical cone-beam CT (CBCT) system with gQMCFFD and sparse simulation and using the output to remove the scatter intensity from the projection data before reconstruction.